Abstract
Introduction Chimeric Antigen Receptor (CAR) T-cell therapy has revolutionized the treatment of relapsed or refractory large B-cell lymphoma (LBCL), yet predicting long-term remission remains a significant clinical challenge. This study aimed to develop machine learning (ML) models trained on real-world data (RWD) that predict progression-free survival (PFS) based solely on clinical and inflammatory factors available at leukapheresis. The goal is to help clinicians optimize CAR T outcomes, reducing the risk of ineffective treatment and mitigating the costs associated with this therapy.
Methods We analyzed prospectively collected RWD from 1309 patients treated with anti-CD19 CAR T therapy across 23 institutions since 2019 and enrolled in the Italian multicenter prospective CART-SIE study. Of the 1309 patients, 779 were included in the analysis after excluding those diagnosed with Mantle Cell Lymphoma (MCL), those who received lisocabtagene maraleucel, those treated with CAR T in the second line, and those for whom PFS could not be computed. Eligible patients had Diffuse Large B-cell Lymphoma (DLBCL, n=508), Primary Mediastinal B-cell Lymphoma (PMBCL, n=85), High-Grade B-cell Lymphoma (HGBCL, n=126), or transformed Follicular Lymphoma (tFL, n=50), and received tisagenlecleucel (n=345) or axicabtagene ciloleucel (n=431) as third-line therapy or beyond. A small subset (~8%) from a single center was held out for external validation. The remaining cohort was split into training and test sets (80–20%), stratified by PFS status. Five ML survival models, with continuous PFS as the outcome, were trained using 22 pre-leukapheresis clinical and inflammatory variables. Feature selection, hyperparameter tuning, and stratified cross-validation (CV) were conducted on the training set only. Explainability was assessed using SHapley Additive exPlanations (SHAP), enabling interpretation of both global and patient-specific predictions.
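As an illustration of the 80–20% split stratified by PFS status, the following is a minimal Python sketch, not the study's actual code. The function name `stratified_split`, its parameters, and the toy event flags are hypothetical; the idea is simply to sample the test set separately within the event and censored groups so that both sets keep the same event rate.

```python
import random


def stratified_split(event_flags, test_fraction=0.2, seed=0):
    """Split patient indices into train/test, preserving the event rate.

    event_flags: list of 0/1 values (1 = progression or death observed,
    0 = censored). Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    # Sample separately within each PFS-status group.
    for status in sorted(set(event_flags)):
        idx = [i for i, e in enumerate(event_flags) if e == status]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy cohort (made-up data): 60 patients with an event, 40 censored.
events = [1] * 60 + [0] * 40
train, test = stratified_split(events)
```

With this toy input, the test set contains 20 patients, 12 of whom have an event, which matches the 60% event rate of the full toy cohort.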
Results All models consistently identified LDH, bulky disease, hemoglobin (Hb), and prior autologous stem cell transplant (ASCT) as key predictors. The Extra Survival Trees model, trained on univariately selected features, achieved a concordance index (C-index) of 0.68, 0.67 (±0.05), and 0.68 on the training, CV, and test sets, respectively, and was selected as the best-performing model. SHAP analysis of test set predictions confirmed known prognostic factors: elevated LDH, bulky disease, low Hb and platelet count, high ferritin, and absence of prior ASCT were associated with poorer outcomes. External validation performance was lower (C-index = 0.55), likely due to differences in follow-up (median follow-up 6 vs. 19 and 17 months in the training and test sets, respectively) and in censoring. Nevertheless, the model showed stable results on internal test data, providing support for its generalizability.
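For readers less familiar with the C-index reported above, the following is a minimal pure-Python sketch of Harrell's concordance index, given as an illustration rather than the implementation used in the study. The function name and the toy data are hypothetical. A pair of patients is comparable when the one with the shorter observed time had an event; the pair is concordant when that patient also received the higher predicted risk. A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data.

    times: observed PFS times; events: 1 if the event was observed,
    0 if censored; risk_scores: higher score = higher predicted risk.
    Pairs with tied times are skipped for simplicity (an assumption
    of this sketch; libraries handle ties in various ways).
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue
        # a is the patient with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored: ordering unknown
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy example (made-up data): risk decreases as PFS time increases.
c = concordance_index([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.7, 0.5, 0.1])
```

Because censored patients contribute only to pairs where the other patient's event occurred earlier, a cohort with short follow-up and heavy censoring yields fewer comparable pairs, which makes the estimate less stable.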
Conclusion An ML model based on a limited set of routinely available pre-leukapheresis variables (bulky disease, LDH, ASCT, Hb, ferritin levels, platelet count) can predict PFS in LBCL patients undergoing CAR T therapy. While further validation is needed, especially across heterogeneous centers, the model's simplicity and interpretability make it promising for clinical integration. A clinician-facing ML tool derived from this work will be presented and could allow real-time risk prediction using six key inputs, with transparent SHAP-based visual explanations to support personalized treatment planning.